<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
  <link rel="StyleSheet" href="css/style.css" type="text/css">
  <meta content="text/html; charset=ISO-8859-1"
 http-equiv="content-type">
  <meta name="verify-v1"
 content="sRQSq4VA5FRMhwzFB4U3I9AtgLMtIWTdpVVO6jg1az4=">
  <title>Atomserver, An Introduction</title>
  <script type="text/javascript">
var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");
document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
  </script>
  <script type="text/javascript">
var pageTracker = _gat._getTracker("UA-4603527-2");
pageTracker._initData();
pageTracker._trackPageview();
  </script>
</head>
<body class="maincontent">
<h1><img style="width: 75px; height: 75px;" src="images/atom-logo.gif"
 alt="logo">&nbsp;<span style="font-weight: bold;"></span>AtomServer,
An Introduction</h1>
<div class="content"><span style="font-style: italic;"><br>
Chris Berry, Bryon Jacob. Updated 05/01/08</span><br>
<h2>Introduction</h2>
AtomServer is a normalized, universal data store, implemented as a
REST-ful web service, and designed as an <span
 style="font-style: italic;">Atom Store</span>. AtomServer is based on
the
following concepts and protocols, which are explained in detail below.<br>
<ul>
  <li><a href="#REST">REST</a> </li>
  <li><a href="#Atom">Atom</a></li>
  <li><a href="#opensearch">Additional Paging Specs</a> </li>
  <li><a href="#atomstore">The Atom Store</a></li>
</ul>
Parts of this document were culled from books and the Internet. These
sources are <a href="#ack">acknowledged below</a>.<br>
<h2><a name="REST"></a>REST</h2>
There is nothing all that remarkable about REST (Representational State
Transfer). It is not some new amazing technology. REST is really just
the realization that every web application - every web site - is, in
fact, a service. And that these "web services" can scale to enormous
size, while delivering an unparalleled loose coupling between the
client and the service.&nbsp; It is probably more remarkable that it
took us so long to apply the principles of the web to distributed
computing. But old habits die hard. <br>
<br>
It is important to understand that <span style="font-style: italic;">REST
is a design pattern</span>. It's not a technology like SOAP or
HTTP.&nbsp; Instead, REST is a novel technique to decompose an
application into smaller parts, so that the whole works better in a
distributed setting.&nbsp; REST is a proven design pattern for building
loosely-coupled, highly-scalable applications. There are important
benefits to sticking to the REST design pattern;<br>
<ul>
  <li><span style="font-weight: bold;">Simple.</span> REST is
incredibly simple to define. There are just a handful of principles and
well defined semantics associated with it.</li>
  <li><span style="font-weight: bold;">Scalable.&nbsp;</span> REST
leads to a very scalable solution by promoting a stateless protocol and
allowing state to be distributed across the web.</li>
  <li><span style="font-weight: bold;">Layered. </span>REST allows any
number of intermediaries, such as proxies, gateways, and firewalls.
Similarly, because REST is ultimately just a web site, albeit one that
adheres to a design pattern, one can easily layer aspects such as
Security, Compression, etc. on an as needed basis, as they would with
any web site.<br>
  </li>
</ul>
The basic technologies of REST are the basic technologies of the web;
the HTTP protocol, the URI naming standard, and the XML markup
language. It is the simplicity of these combined technologies that
gives REST its unprecedented power.&nbsp; REST is not collapsing under
the complicated morass of protocols and standards (SOAP, WSDL, XSD,
WS-* , ...)&nbsp; that are smothering "Big Web Services".&nbsp; This
quote from John Gall sums it up perfectly; <br>
<br>
<div style="margin-left: 80px;"><span style="font-style: italic;">"A
complex system that works is invariably found to have evolved from a
simple system that worked."<br>
</span></div>
<br>
So why is it called Representational State Transfer? The explanation is
as follows; the web is comprised of <span style="font-style: italic;">resources</span>.
A resource is any item of interest defined by some URL. For example,
a website may define a <span
 style="font-style: italic; font-weight: bold;">resource</span> for a
particular Order with, say, this URL:<br>
<br>
&nbsp;&nbsp;&nbsp; <span style="font-family: monospace;">http://www.foo.com/orders/12345</span><br>
<br>
And when the client accesses that URL, a <span
 style="font-style: italic; font-weight: bold;">Representation</span>
of the resource is returned (e.g., <span
 style="font-family: monospace;">12345.en.xml</span>). The
representation places the client application in a <span
 style="font-weight: bold;">State</span>. If the client traverses some
other hyperlink embedded in <span style="font-family: monospace;">12345.en.xml,
</span>then another resource is accessed. And the new representation
places the client application into yet another state. Thus, the client
application changes (<span style="font-weight: bold;">Transfer</span>s)
state with each resource representation. Which you can see yields the
term; <span style="font-style: italic;">Representational State Transfer</span>.<br>
<br>
The fundamental principles of the REST design pattern are; <br>
<ul>
  <li><span style="font-weight: bold;">Everything is a Resource.</span>
A resource can be thought of a distant thing that you can interact
with, but do not manipulate directly.&nbsp; </li>
  <li><span style="font-weight: bold;">Resources have names.</span>
Every resource has at least one name. This name must be unique and can
only refer to one resource. On the web, URIs are perfect match for this
requirement.&nbsp; They enable a resource to be found by any
application.</li>
  <li><span style="font-weight: bold;">Resources support simple verbs.</span>
Every resource is interacted with using a universally predefined set of
"verbs".&nbsp; These verbs are not defined by individual resources, but
    <span style="font-style: italic;">across all resources</span>.&nbsp;
On the web, the verbs are the standard HTTP methods <span
 style="font-family: monospace;">POST, GET, PUT</span>, and <span
 style="font-family: monospace;">DELETE</span>.&nbsp; Each verb has
clearly defined semantics that can be relied upon.&nbsp; This is
critical so intermediaries can do the right thing.</li>
  <li><span style="font-weight: bold;">Resources have Representations.</span>
Since resources can be thought of far away things, it is not possible
to manipulate the resource directly.&nbsp; Instead, through
communication, we can exchange representations for the resource.&nbsp;
The representation can be intended for human interaction (e.g. a HTML
page) or for machine-to-machine interaction (e.g. XML or even binary
data).</li>
</ul>
<h2><a name="Atom"></a>Atom </h2>
Fundamentally, <a
 href="http://bitworking.org/projects/atom/draft-ietf-atompub-protocol-04.html">Atom</a>
is an XML vocabulary for describing lists of timestamped entries. These
entries can be anything, although because Atom was originally conceived
to replace RSS (Rich Site Summary) the entries often contain human
authored text, such as a weblog or news site. Thus, the internal
structure of an Atom entry (i.e. the XML elements and attributes)
conveys the semantics of publishing; authors, languages, titles, and so
on. <br>
<br>
In fact, the entire idea behind web syndication (i.e. RSS and Atom) is
lifted from the world of publishing.&nbsp; In the world of newspapers,
a syndicate (e.g. the Associated Press) distributes information to
subscribers, allowing each publication to tailor the content of
the information it receives. Comics, news stories, and opinion columns
often
are distributed by syndicates, providing greater exposure for the
authors and more content for the readers. Web syndication is identical;
web content is distributed to subscribers (as feeds) who tailor it to
their needs, often aggregating it and providing it (as aggregated
feeds) to further downstream subscribers.<br>
<br>
Atom, like RSS, provides the basis for a web syndication framework. The
Atom Publishing Protocol (APP) leverages the work done on the Atom
Syndication Format and the basics of HTTP to form a simple, yet
powerful, publishing protocol. It is useful because so many web
services are, in a broad sense, ways of publishing information.
Furthermore, there are a large number of web service clients out there
that understand the semantics of Atom documents. This includes both
browser-based clients, as well as reusable clients written in
practically every computer language you might imagine.<br>
<br>
Atom lists are <span style="font-style: italic;">feeds</span>, and the
items in the lists are <span style="font-style: italic;">entries</span>.
Atom is a <span style="font-style: italic;">RESTful</span> protocol.
These days, most weblogs, wikis, news
services, etc. expose a special resource whose representation is an
Atom feed. The entries in the feed describe and link to other
resources; weblog or wiki entries or news stories published on the
site. The client can consume these resources with a feed reader or some
other external program.&nbsp; An example Atom feed might look
something like this; <br>
<br>
<div style="margin-left: 80px;"><span style="font-family: monospace;">&lt;?xml
version="1.0"?&gt;</span><br style="font-family: monospace;">
<span style="font-family: monospace;">&lt;feed
xmlns="http://purl.org/atom/ns#"&gt;<br>
&nbsp; &lt;link rel="alternate" href="http://example.com/MyBlog"/&gt;<br>
&nbsp; &lt;updated&gt;2007-04-14T20:00:39Z&lt;/updated&gt;<br>
&nbsp; &lt;author&gt;Chris Berry&lt;/author&gt; &nbsp; <br
 style="font-family: monospace;">
</span><span style="font-family: monospace;">&nbsp; &lt;title&gt;</span><span
 style="font-family: monospace;">My Weblog</span><span
 style="font-family: monospace;">&lt;/title&gt;<br>
&nbsp; &lt;id&gt;urn:1aaabbbb-2ccc-3ddd-4567ffffffff&lt;/id&gt;<br
 style="font-family: monospace;">
</span><span style="font-family: monospace;">&nbsp; &lt;entry &gt;</span><br
 style="font-family: monospace;">
<span style="font-family: monospace;">&nbsp;&nbsp;&nbsp; &lt;title&gt;</span><span
 style="font-family: monospace;">First Post</span><span
 style="font-family: monospace;">&lt;/title&gt;</span><br
 style="font-family: monospace;">
<span style="font-family: monospace;">&nbsp; &nbsp; &lt;link rel="edit"
href="http://example.com/MyBlog/1234"/&gt;<br>
&nbsp; &nbsp; &lt;updated&gt;2007-04-14T20:00:39Z&lt;/updated&gt;<br>
</span><span style="font-family: monospace;">&nbsp; &nbsp;
&lt;id&gt;urn:3bbbcccc-2ccc-1ddd-1234ffffffff&lt;/id&gt;<br>
&nbsp;&nbsp;&nbsp; &lt;summary&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Details about my summer vacation<br>
&nbsp;&nbsp;&nbsp; &lt;/summary&gt;<br style="font-family: monospace;">
</span><span style="font-family: monospace;"></span><span
 style="font-family: monospace;"></span><span
 style="font-family: monospace;">&nbsp; &lt;/entry&gt;</span><br
 style="font-family: monospace;">
<span style="font-family: monospace;"></span><span
 style="font-family: monospace;">&lt;/feed&gt;</span><br
 style="font-family: monospace;">
<br style="font-family: monospace;">
</div>
In this example you can see some of the tags tags that convey the
semantics of publishing; <span
 style="font-weight: bold; font-family: monospace;">author, title,
link, updated</span>, and so on. The <span
 style="font-weight: bold; font-family: monospace;">feed</span> as a
whole contains an <span
 style="font-weight: bold; font-family: monospace;">author</span>, and
since the <span style="font-weight: bold; font-family: monospace;">entry</span>
does not, it inherits the <span style="font-weight: bold;"><span
 style="font-family: monospace;">author</span> </span>information from
the <span style="font-weight: bold; font-family: monospace;">feed</span>.
The <span style="font-weight: bold; font-family: monospace;">feed</span>
has a <span style="font-weight: bold; font-family: monospace;">link</span>
that presumably points to an alternate URI for the underlying feed
resource. The <span style="font-weight: bold; font-family: monospace;">entry</span>
also has a <span style="font-weight: bold; font-family: monospace;">link</span>,
which identifies the <span
 style="font-weight: bold; font-family: monospace;">entry</span> as a
resource in its own right. The <span
 style="font-weight: bold; font-family: monospace;">entry</span>
contains a <span style="font-weight: bold; font-family: monospace;">summary</span>,
which the feed reader would most likely expose to the user. Presumably,
to get the full blog, the user must subsequently <span
 style="font-family: monospace;">GET</span> the <span
 style="font-weight: bold; font-family: monospace;">entry</span>'s URI.
<br>
<br>
An Atom document is basically a listing of published resources. You can
use Atom to represent practically <span style="font-style: italic;">any</span>
published resource - a set of purchase orders, images, a list of
search results, whatever. Or you can omit the <span
 style="font-family: monospace; font-weight: bold;">link</span>&nbsp;
element in the <span style="font-weight: bold; font-family: monospace;">entry</span>
and use Atom as a container for the original content. <br>
<br>
APP is all about pushing around Atom entries. And it is important to
note that entries, like feeds<span
 style="font-weight: bold; font-family: monospace;"></span>, are first
class citizens within the protocol. Each <span
 style="font-family: monospace; font-weight: bold;">entry</span> has a
corresponding representation, and thus, each <span
 style="font-weight: bold; font-family: monospace;">entry</span> has a
corresponding URI that represents it.&nbsp; <br>
<br>
Atom is, by definition, RESTful. Do an HTTP <span
 style="font-family: monospace;">GET</span> on that URI to retrieve an
entry representing the underlying data; <span
 style="font-family: monospace;">PUT</span> a new <span
 style="font-family: monospace; font-weight: bold;">entry</span> to the
URI
to update the represented data. HTTP <span
 style="font-family: monospace;">DELETE</span> on that URI and the
represented data is deleted. The entries that are used to represent the
data are
grouped together in a <span
 style="font-weight: bold; font-family: monospace;">collection</span>.
That, too, is a resource and has its
own URI. <br>
<br>
Atom feeds and entries must contain certain elements. For example, an <span
 style="font-family: monospace;">updated </span>element, which may be
used along with the standard HTTP Headers <span
 style="font-family: monospace;">If-Modified-Since</span> and <span
 style="font-family: monospace;">If-Unmodified-Since</span> - or
alternately with ETags - to provide a mechanism to return only the
entries that have changed. Obviously, this can yield a significant
performance improvement. Likewise, Atom can make use of the standard
HTTP <span style="font-family: monospace;">Cache</span> Headers to
provide further performance enhancements.<br>
<br>
If an application doesn't quite fit the Atom schema, it is possible to
embed XML tags from other namespaces in a Atom feed. It is even
possible to define a custom namespace and embed its elements in your
Atom feeds. Clients that do not understand your special elements will
see a normal Atom feed with some mysterious data in it, which it is
required to simply ignore. <br>
<br>
Atom is an important addition to a RESTful architecture.&nbsp; It
provides a standard for both program control, and for error processing.
By definition, a client knows exactly how
to interact with Atom and what will happen (error codes, etc.) when
things go wrong. This makes it easy to write generic clients, and to
mash together disparate Atom feeds into something greater than the
individual feeds alone.<br>
<br>
Atom also provides a mechanism for assigning <span
 style="font-weight: bold; font-family: monospace;">categories</span>
to Entries. This is a very powerful concept. It essentially allows
Clients to extend the original content, making it much richer, yet
leaving it untouched.<br>
<h2><a name="opensearch"></a>Additional Paging Specs<br>
</h2>
When a client requests a feed of all entries that have changed since
some particular date, they are essentially doing a time-sensitive
search, where the search parameter is the <span
 style="font-family: monospace;">If-Modified-Since</span> date.&nbsp;
It is not hard too imagine that a feed request like this might produce
a huge set of results. In order to save bandwidth, and to avoid
overwhelming the client with possibly irrelevant data, it is common to
divide large feeds into successive "pages", giving the user the ability
to chain through the pages as they wish (e.g. Google). <br>
<br>
Atom does not address the problem of "paging", so a couple of Internet
specifications have emerged. <a href="http://www.opensearch.org/Home">OpenSearch</a>
is one XML
vocabulary that's sometimes embedded in Atom documents. OpenSearch is a
Creative-Commons-licensed specification, that was created by Amazon in
2003. OpenSearch defines a RESTful protocol for searching, including a
format for
advertising what kind of search your site supports, and specifying how
to return your search results in Atom or RSS. An OpenSearch-enabled web
service returns the results of a query as an Atom feed, with the
individual results represented as Atom entries.<br>
<br>
Some aspects of a list of search results cannot be represented in a
standard Atom feed. So OpenSearch defines three new elements in the
opensearch namespace:<br>
<ul>
  <li><span style="font-family: monospace;">totalResults</span> &nbsp;
The total number of results that matched the query</li>
  <li><span style="font-family: monospace;">itemsPerPage</span>&nbsp;&nbsp;
How many items are returned in a single "page" of search results.</li>
  <li><span style="font-family: monospace;">startIndex&nbsp;</span>&nbsp;
If all search results are numbered from zero to <span
 style="font-family: monospace;">totalResults,</span> the the first
result in this feed document is entry number <span
 style="font-family: monospace;">startIndex</span>. When combined with <span
 style="font-family: monospace;">itemsPerPage</span> you can use this
to figure out what "page" of results you are on.</li>
</ul>
Unfortunately OpenSearch&nbsp; does not well address the paging problem
for time-based data (i.e. <span style="font-family: monospace;">If-Modified-Since</span>)
because this data is not static and thus, <span
 style="font-family: monospace;">startIndex</span> is not sufficient
for determining the next page of results, particularly when the Server
does not want to maintain state for all individual searches. Another
specification, <a
 href="http://www.ietf.org/internet-drafts/draft-nottingham-atompub-feed-history-11.txt">Feed
Paging</a>,&nbsp; addresses the problem more generically, adding links
for <span style="font-family: monospace;">next</span> and <span
 style="font-family: monospace;">previous</span> which the client can
use to chain through pages of results.
<h2><a name="atomserver"></a>The Atom Store<br>
</h2>
In the last couple of years these two ideas, the Atom Publishing
Protocol and
additional Paging Specs, that have started to show up together. The
most important
and seminal example of this being <a
 href="http://code.google.com/apis/gdata/index.html">GData</a>;
Google's API for managing data on the web. <br>
<br>
Atom already provides the capability to return an Atom<span
 style="font-weight: bold; font-family: monospace;"> entry</span> for
each dataset you choose to represent on your site, so returning a bunch
of them in the form of a feed, in response to an OpenSearch request,
isn't such a great leap.&nbsp; When we stop limiting Atom entries to
web content like weblogs, and expand APP to the management of general
data, we arrive at an <span style="font-style: italic;">Atom Store</span>;
a generic, inter-linked blob of Atom entries, which you can edit using
the APP, and then search over using OpenSearch<br>
<br>
One huge advantage of an Atom Store is that it's built on top of
REST-ful services. That means that we get the advantages of REST --
caching and uniform interfaces and hypermedia as the engine of
application state. For both OpenSearch and Atom there is a self
describing XML document that describes the capabilities of each
endpoint. That allows another service to come along and wrap several
Atom Stores together by reading those description documents and then
presenting itself as an Atom Store, an aggregate of all those stores it
uses. And this aggregate store might be a melange of your disparate
data, or on the other hand, it might be a uniform series of servers
each with a subset of a huge store, yielding, in turn, a monster
database.<br>
<h2><a name="ack"></a>Acknowledgments</h2>
Giving credit where it is due. Parts of this document were culled from
various sources; <br>
<ul>
  <li><a
 href="http://www.amazon.com/RESTful-Web-Services-Leonard-Richardson/dp/0596529260">RESTful
Web Services, by Leonard Richardson &amp; Sam Ruby</a></li>
  <li><a href="http://doc.opengarden.org/REST/Introduction_to_REST">An
Introduction to REST, by Steve Bjorg</a></li>
  <li><a href="http://www.linuxjournal.com/article/7729">At the Forge -
Aggregating with Atom, by Reuven Lerner</a> <br>
  </li>
  <li><a href="http://www.xml.com/lpt/a/1620">Dreaming of an Atom
Store: A Database for the Web, by Joe Gregorio</a></li>
</ul>
<br>
&nbsp;<br>
<br>
<br>
<br>
<br>
<br>
<br>
</div>
</body>
</html>
